On Density Estimation from Censored Data
by Penalized Likelihood Methods


Abstract 


Estimators for the probability density function, cumulative distribution
function, and hazard function are proposed in the random censorship setting.
The estimators are derived from the Kaplan-Meier product limit estimator by
maximum penalized likelihood methods. We establish the existence and
uniqueness of the estimates, which are exponential splines with knots at
the uncensored observations, and provide an efficient algorithm for their
numerical evaluation. We prove the consistency, in probability and almost
surely, of the density estimates in the Hellinger distance, the $L_p$ norms
for $p = 1, 2, \infty$, and the Sobolev norm. The corresponding hazard rate
estimator converges uniformly on bounded intervals.
1. Introduction


The classic problem in the independent random censorship model is to
estimate the distribution function nonparametrically. The maximum likelihood
estimator is the well-known product limit estimator introduced by Kaplan
and Meier (1958). We propose an estimator of the density derived from the
Kaplan-Meier estimator by maximum penalized likelihood techniques.


For uncensored data, the maximum penalized likelihood estimator (MPLE)
was introduced by Good and Gaskins (1971, 1980). Let $X_1, X_2, \ldots, X_n$
be i.i.d. random variables from a distribution $F$ with density $f$, and let
$F_n$ denote the corresponding empirical distribution function. The MPLE,
denoted by $f_n$, maximizes the likelihood $\prod_{i=1}^{n} f(X_i)$ over a
space of "smooth" functions. (Requiring smoothness avoids the Dirac delta
solution of the unconstrained problem.) Equivalently, $f_n$ is the maximizer of

$$n \int \log f \, dF_n - \Phi(f) \tag{1.1}$$

subject to $\int f = 1$ and $f \ge 0$, where $\Phi(f)$ is a "roughness penalty".
DeMontricher, Tapia, and Thompson (1975) rigorously established the existence
and uniqueness of the solution $f_n$ within the framework of Sobolev spaces, and
showed that the resulting MPLE is a spline function with knots at the sample
points. Silverman (1982) proposed a class of estimators with roughness
penalties on $\log f$ and studied their statistical properties and asymptotic
distribution theory. Klonias (1984) obtained existence, uniqueness and
consistency results for a broad class of penalty functionals on $f^{1/2}$.

In the censored data setting, Lubecke and Padgett (1984) proposed
estimating the density $f$ by the maximizer of the penalized conditional
likelihood, given which observations were censored. Questions of evaluation
of the estimator and consistency were not addressed.

We propose estimators for the density, distribution, and hazard functions
derived by maximum penalized likelihood techniques. These estimators are based
on an estimate of the root-density $v = f^{1/2}$, denoted by $u_n$, which is an
exponential spline function with knots at the uncensored observations. The
estimator $u_n$ corresponds to the "first MPLE" of Good and Gaskins (1971) in
the uncensored setting. The advantages of parameterizing the problem through
the root-density are that it is square-integrable, conveniently allowing the
use of Hilbert space methods, and that it avoids the nonnegativity constraint
$f \ge 0$, while providing the same density estimator as the direct approach
(for the same penalty functional) when the MPLE $u_n$ turns out to be
nonnegative, as is the case here; see Lemma 3.1 of DeMontricher et al. (1975).
In addition, the square root transformation is a variance stabilizing
transformation for the density estimation problem, so that it seems appropriate
to impose a global roughness penalty on $v = f^{1/2}$ rather than on $f$; see
Tukey (1972) and Good and Gaskins (1971, 1980). We therefore treat $v$ as the
parameter of the problem, let it vary over an appropriate Hilbert space, and
express (1.1) in terms of it alone.
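As a worked illustration of why the root-density parameterization is convenient, the short derivation below rewrites a Fisher-information-type penalty on $f$ as a plain $L_2$ penalty on $v' = (f^{1/2})'$. The specific penalty $(\alpha/4)\int (f')^2/f$ and the weight $\alpha > 0$ are taken here as assumptions for the sake of the example.

```latex
% Assumption: penalty \Phi(f) = (\alpha/4) \int (f')^2 / f, with \alpha > 0
% a smoothing weight (used here only for illustration).
\begin{align*}
f = v^2 \;\Rightarrow\; f' &= 2 v v', \\
\frac{(f')^2}{f} &= \frac{4 v^2 (v')^2}{v^2} = 4 (v')^2, \\
\frac{\alpha}{4} \int \frac{(f')^2}{f} &= \alpha \int (v')^2, \qquad
\int f = 1 \iff \int v^2 = 1 .
\end{align*}
```

Under this substitution the normalization becomes a unit-norm constraint in $L_2$ and the penalty becomes a squared $L_2$ norm of the derivative, which is what makes Hilbert space arguments available.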

Estimators of the density $f$, distribution function $F$ and hazard rate $r$
are derived from $u_n$ by

$$f_n(t) = u_n(t)^2, \qquad F_n(t) = \int_0^t f_n(s)\,ds, \qquad
\text{and} \quad r_n(t) = f_n(t)/[1 - F_n(t)].$$
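The three derived estimators can be sketched numerically as follows. This is a minimal illustration, assuming the root-density estimate has already been evaluated on an increasing grid starting at 0; the function name `estimators_from_root_density` and the trapezoidal integration are choices made for this example, not part of the original algorithm.

```python
import numpy as np

def estimators_from_root_density(t, u):
    """Given grid points t (increasing, t[0] = 0) and root-density values u(t),
    return the density f = u^2, its cumulative integral F, and the hazard
    r = f / (1 - F), computed wherever 1 - F is positive."""
    t = np.asarray(t, dtype=float)
    u = np.asarray(u, dtype=float)
    f = u ** 2
    # cumulative trapezoidal rule for F(t) = integral of f from 0 to t
    increments = 0.5 * (f[1:] + f[:-1]) * np.diff(t)
    F = np.concatenate(([0.0], np.cumsum(increments)))
    surv = 1.0 - F
    r = np.full_like(f, np.nan)
    positive = surv > 0
    r[positive] = f[positive] / surv[positive]
    return f, F, r

# Example: the root of a unit exponential density has constant hazard 1.
grid = np.linspace(0.0, 5.0, 5001)
f_hat, F_hat, r_hat = estimators_from_root_density(grid, np.exp(-grid / 2.0))
```

The hazard is left undefined (NaN) where the estimated survival function is not positive, which mirrors the restriction to bounded intervals in the hazard rate results.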


The existence, uniqueness and implicit representation of $u_n$ as an
exponential spline with knots at the uncensored observations are derived in
Section 2, where we also discuss the numerical evaluation of the estimator
through an efficient algorithm of Klonias and Nash (1983 a,b) based on a
truncated Newton method described in Nash (1984).

We establish consistency of the proposed estimators under mild moment
and smoothness conditions. We rely on asymptotic results for the Kaplan-Meier
estimator by Gill (1983) for consistency in probability, and by Foldes
and Rejto (1981) for almost sure consistency. The central proposition
establishes consistency of $f_n$ in the Hellinger distance, i.e.,

$$\| u_n - f^{1/2} \|_2 \to 0 \quad \text{as } n \to \infty$$

almost surely or in probability under suitable conditions, and determines
lower bounds on the rate of convergence in each case. Consequently, we
obtain consistency of $u_n$ in the supremum norm and Sobolev norm, consistency
of the density estimator in the $L_1$, $L_2$, supremum, and Sobolev norms, and
uniform convergence of the hazard rate estimator $r_n$ on bounded intervals.
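For readers who want to monitor this mode of convergence numerically, the sketch below computes the $L_2$ distance between two root-densities on a grid. It is an illustration only: the function name `hellinger_l2` is hypothetical, the distance is left unnormalized (no factor $1/\sqrt{2}$), and the integral is approximated by the trapezoidal rule.

```python
import numpy as np

def hellinger_l2(t, f_est, f_true):
    """Approximate || f_est^{1/2} - f_true^{1/2} ||_2 on the grid t
    using the trapezoidal rule."""
    t = np.asarray(t, dtype=float)
    diff_sq = (np.sqrt(f_est) - np.sqrt(f_true)) ** 2
    return np.sqrt(np.sum(0.5 * (diff_sq[1:] + diff_sq[:-1]) * np.diff(t)))

# Example: distance between exponential densities with rates 1 and 2.
grid = np.linspace(0.0, 40.0, 40001)
dist = hellinger_l2(grid, np.exp(-grid), 2.0 * np.exp(-2.0 * grid))
```

For two densities the squared distance equals $2 - 2\int (fg)^{1/2}$, so it always lies between 0 and 2.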

The  assumptions,  statements,  and  proofs  of  the  consistency  and  rate 
of  convergence  bounds  are  presented  in  section  3.  Auxiliary  lemmas,  which 
provide  bounds  on  integrals  needed  to  establish  consistency  in  the  Hellinger 
distance,  are  proved  in  section  4,  including  a  result  regarding  the  entropy 
of  continuous  distributions. 


2.  Formulation  of  the  Estimator 


The Random Censorship Model: Let $X_1, X_2, \ldots, X_n$ be independent positive
random variables with common density function $f$ and cumulative distribution
function $F$. Let $Y_1, Y_2, \ldots, Y_n$ be independent positive random variables,
representing censoring times, with common distribution function $G$, which
may be discontinuous or defective. The random variables $Y_1, Y_2, \ldots, Y_n$ are
assumed to be independent of $X_1, X_2, \ldots, X_n$. The observations are
$\{(Z_i, \delta_i): i = 1, 2, \ldots, n\}$, defined by

$$Z_i = X_i \wedge Y_i \quad \text{and} \quad \delta_i = 1\{X_i \le Y_i\},$$

where $\wedge$ denotes minimum and $1\{A\}$ denotes the indicator random variable of
the event $A$. Denote the distribution function of $\{Z_i\}$ by $H$, which is given
by

$$1 - H = (1 - F)(1 - G).$$

Define $T_F = \sup\{t: F(t) < 1\}$, with $T_G$ and $T_H$ defined similarly.


The product-limit estimator $F_n$ is given by

$$1 - F_n(t) = \prod_{\{i:\, Z_{ni} \le t\}} \left(1 - \frac{1}{n-i+1}\right)^{\delta_{ni}},$$

where $Z_{n1} \le Z_{n2} \le \cdots \le Z_{nn}$ denote the ordered observations
$\{Z_i\}$, and $\delta_{n1}, \delta_{n2}, \ldots, \delta_{nn}$ denote the corresponding
indicators $\{\delta_i\}$. The Kaplan-Meier estimator has jumps only at the
observations for which $\delta_i = 1$, which are called uncensored observations.
There is a random number $N_n$ of uncensored observations. We let
$T_{n1} \le T_{n2} \le \cdots \le T_{nN_n}$ denote the ordered uncensored
observations, and let $w_{ni}/n$ denote the size of the jump of $F_n$ at $T_{ni}$.
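The jump sizes of the product-limit estimator can be computed directly from the ordered data. The sketch below is one way to do this, assuming no ties among the observations (as holds almost surely for continuous $F$); the function name `km_jump_weights` is hypothetical and not taken from the paper.

```python
import numpy as np

def km_jump_weights(z, delta):
    """Return the ordered uncensored observations T and the weights w,
    where w / n are the jumps of the Kaplan-Meier estimator at T.
    Assumes no ties among the observed times z."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(z)
    order = np.argsort(z, kind="stable")
    z, delta = z[order], delta[order]
    ranks = np.arange(1, n + 1)
    # survival factor (1 - 1/(n-i+1))^{delta_i} at the i-th ordered observation
    factors = np.where(delta == 1, 1.0 - 1.0 / (n - ranks + 1), 1.0)
    surv = np.cumprod(factors)
    surv_before = np.concatenate(([1.0], surv[:-1]))
    jumps = surv_before - surv
    uncensored = delta == 1
    return z[uncensored], n * jumps[uncensored]

# Example with four observations, the second smallest being censored.
T, w = km_jump_weights([3.0, 1.0, 4.0, 2.0], [1, 1, 1, 0])
```

In the example the censored observation at 2.0 passes its mass to the later uncensored points, so their weights exceed 1 while the first weight stays at 1.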
The Optimization Problem: In the censored data setting, (1.1) suggests
estimating $f$ by the maximizer $f_n$ of

$$n \int \log f \, dF_n - (\alpha/4) \int (f'/f)^2 f$$

subject to: $\int f = 1$, $f \ge 0$,

or equivalently, see Lemma 3.1 of DeMontricher et al. (1975), by $f_n = u_n^2$,
where $u_n$ denotes the maximizer of the following optimization problem:

$$\max \left\{ n \int_A \log u^2 \, dF_n - \alpha \int_A (u')^2, \quad u \in H(A) \right\} \tag{2.1}$$

subject to: $\int_A u^2 = 1$
and $u(T_{ni}) \ge 0$, $1 \le i \le N_n$,

where $\alpha > 0$, $H(A) = \{u \in L_2(A): u' \in L_2(A)\}$, and we consider the cases
$A = \mathbb{R}_+$ and $A = \mathbb{R}$. Incorporating the first constraint into the
objective function, we consider maximization of

$$\ell_\lambda(u) = n \int_A \log u^2 \, dF_n - \alpha \int_A (u')^2 - \lambda \int_A u^2,$$

where $\lambda$ is the Lagrange multiplier corresponding to the constraint. Then
the solutions $u_n^{(1)}$, $u_n^{(2)}$ of (2.1) over $\mathbb{R}$ and $\mathbb{R}_+$,
respectively, are given implicitly by

$$u_n^{(j)}(t) = \lambda^{-1} \sum_{i=1}^{N_n} w_{ni}\, u_n^{(j)}(T_{ni})^{-1}\, k_j(t, T_{ni}; h), \qquad j = 1, 2, \tag{2.2}$$

where $\lambda > 0$ is the Lagrange multiplier associated with the first constraint,
$h = (\alpha/\lambda)^{1/2}$, and

$$k_1(x, y; h) = h^{-1} e((x-y)/h),$$

$$k_2(x, y; h) = (2h)^{-1} \{ e((x-y)/h) + e((x+y)/h) \},$$

$$e(x) = \exp\{-|x|\}/2, \qquad x \in \mathbb{R}.$$


Note that $k_1$, $k_2$ are the kernels of the reproducing kernel Hilbert spaces
(RKHS) $H(A)$, endowed with the inner products
$\langle u_1, u_2 \rangle = \int_A h^2 u_1' u_2' + u_1 u_2$
for $A = \mathbb{R}$, $\mathbb{R}_+$ respectively. The parameter $h$ plays the role
of the "bandwidth" of a kernel estimator and we will equivalently use $h$ rather
than $\alpha$ as our smoothing parameter. For the consistency results of Section 3
we will let $h$ depend on $n$, i.e., $h_n = O(n^{-t})$, $t > 0$. Then $\lambda$, which
also depends on $n$, behaves asymptotically like $n$, so that
$\alpha_n = O(n^{1-2t})$. For a development of the consistency of the MPLE's in the
uncensored case, with $\alpha$ rather than $h$ the independent parameter, see
Klonias (1982).
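The reproducing property can be checked numerically for the whole-line kernel. The sketch below does this for $k_1$ only, assuming the inner product $\int h^2 u_1' u_2' + u_1 u_2$ on $\mathbb{R}$; the Gaussian-shaped test function, the grid, and the helper name `h_inner_product` are illustrative choices.

```python
import numpy as np

def e(x):
    """Double-exponential profile e(x) = exp(-|x|) / 2."""
    return np.exp(-np.abs(x)) / 2.0

def k1(x, y, h):
    """Whole-line kernel k_1(x, y; h) = e((x - y) / h) / h."""
    return e((x - y) / h) / h

def h_inner_product(u, du, v, dv, dx, h):
    """Riemann-sum approximation of the integral of h^2 u' v' + u v."""
    return np.sum(h ** 2 * du * dv + u * v) * dx

dx, h, y = 1e-3, 0.5, 0.3
x = np.arange(-20.0, 20.0, dx)
k = k1(x, y, h)
dk = -np.sign(x - y) * k / h          # derivative of k_1 in x away from x = y
u = np.exp(-x ** 2)                   # smooth test function (an arbitrary choice)
du = -2.0 * x * u
reproduced = h_inner_product(k, dk, u, du, dx, h)   # should be close to u(y)
```

Up to discretization error, `reproduced` agrees with `np.exp(-y ** 2)`, which is the point evaluation the kernel is supposed to deliver.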

In the remainder of the paper, $u_n$ refers specifically to $u_n^{(2)}$, $k = k_2$,
$A = \mathbb{R}_+$ and $H$ denotes $H(\mathbb{R}_+)$. However, arguments applying to
$u_n^{(1)}$ are nearly identical and, in instances when they differ, are slightly
simpler.

Existence  and  Uniqueness: 

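PROPOSITION 2.3: Let $H_0(A) = \{u \in H(A): u(T_{ni}) \ge 0,\ i = 1, 2, \ldots, N_n\}$.
For each $\lambda > 0$, there exists a unique maximizer $u_\lambda$ of (2.1) in
$H_0(A)$, which is a spline function given implicitly by

$$u_\lambda(t) = \lambda^{-1} \sum_{i=1}^{N_n} w_{ni}\, u_\lambda(T_{ni})^{-1}\, k(t, T_{ni}; h).$$

Then $u_n = u_{\lambda_n}$, where $\lambda_n$ is the value of $\lambda$ for which
$\|u_\lambda\|_2 = 1$.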

Proof. The proof relies on Theorem 7 of Appendix I of Tapia and Thompson
(1978). The set $H_0(A) = \{u \in H: u(T_{ni}) \ge 0,\ 1 \le i \le N_n\}$ is closed and
convex. The second Gateaux variation of $\ell_\lambda(u)$, given by

$$\nabla^2 \ell_\lambda(u)(\eta, \xi) = -2 \left\{ \sum_{i=1}^{N_n} w_{ni}\, u(T_{ni})^{-2}\, \eta(T_{ni})\, \xi(T_{ni}) + \lambda \langle \eta, \xi \rangle \right\},$$

$\eta, \xi \in H$, is uniformly negative definite:
$\nabla^2 \ell_\lambda(u)(\eta, \eta) \le -2\lambda \|\eta\|^2$. To establish the
existence and uniqueness of the maximizer of $\ell_\lambda(u)$ over $H_0(A)$ by Tapia
and Thompson's result, it suffices to show that $\ell_\lambda$ is continuous.
Continuity of $\ell_\lambda$ follows from

$$\big|\, \|u\| - \|u^*\| \,\big| \le \|u - u^*\|$$

and

$$|u(T_{ni}) - u^*(T_{ni})| = |\langle k(\cdot, T_{ni}; h),\, u - u^* \rangle| \le \|k(\cdot, T_{ni}; h)\|\, \|u - u^*\|.$$

Note that the constraints $u(T_{ni}) \ge 0$ cannot be active at the maximum,
so the stationary point of the Lagrangian of the problem satisfies

$$\nabla \ell_\lambda(u)(\eta) = 0 \quad \forall\, \eta \in H.$$

Equivalently,

$$\left\langle \sum_{i=1}^{N_n} w_{ni}\, u(T_{ni})^{-1}\, k(\cdot, T_{ni}; h) - \lambda u,\ \eta \right\rangle = 0 \quad \forall\, \eta \in H,$$

so

$$\lambda^{-1} \sum_{i=1}^{N_n} w_{ni}\, u(T_{ni})^{-1}\, k(\cdot, T_{ni}; h) - u = 0$$

provides the form of the maximizer implicitly. □


Numerical Evaluation: Since $u_n$ is determined by its values at the uncensored
observations, we will obtain $u_n(T_{ni})$ by first solving for

$$q_i = \lambda_n^{-1/2}\, u_n(T_{ni})^{-1}, \qquad 1 \le i \le N_n.$$

Evaluating the implicit form of $u_n$ at $\{T_{ni}\}$, we obtain the following
nonlinear system of equations:

$$q_i^{-1} = \sum_{j=1}^{N_n} w_{nj}\, q_j\, k(T_{ni}, T_{nj}; h), \qquad 1 \le i \le N_n. \tag{2.3}$$

Note that the solution $q = (q_1, \ldots, q_{N_n})^T$ of (2.3) is the unique solution
to the following finite dimensional optimization problem:

$$\min \left\{ q^T W \Phi W q - \sum_{i=1}^{N_n} w_{ni} \log q_i^2, \quad q \in \mathbb{R}^{N_n} \right\},$$

where $W = \mathrm{diag}\{w_{n1}, \ldots, w_{nN_n}\}$ and the $(i,j)$-th entry of the
positive definite matrix $\Phi$ is given by $k(T_{ni}, T_{nj}; h)$. Existence and
uniqueness of the solution to this problem are discussed in Klonias and Nash
(1983), who present an efficient algorithm for the numerical evaluation of $q$.
The parameter $\lambda_n$ is then obtained from the equation $\int u_n^2 = 1$, i.e.,

$$\lambda_n = \sum_{i=1}^{N_n} \sum_{j=1}^{N_n} w_{ni} w_{nj}\, q_i q_j\, (k * k)(T_{ni}, T_{nj}; h),$$

where $*$ denotes convolution. The values $u_n(T_{ni})$ are then obtained from $q$
and $\lambda_n$.
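The evaluation steps above can be sketched in code. The example below is not the truncated Newton algorithm of Klonias and Nash; it swaps in a plain damped Newton iteration, works with the whole-line kernel $k_1$ only, and assumes the scaling $q_i = \lambda^{-1/2} u(T_i)^{-1}$, under which the implicit equation becomes $q_i^{-1} = \sum_j w_j q_j k(T_i, T_j; h)$. All function names are hypothetical.

```python
import numpy as np

def k1(x, y, h):
    """Whole-line kernel exp(-|x - y| / h) / (2h)."""
    return np.exp(-np.abs(x - y) / h) / (2.0 * h)

def k1_conv(x, y, h):
    """(k1 * k1)(x, y; h): integral over s of k1(s, x; h) * k1(s, y; h)."""
    d = np.abs(x - y) / h
    return (1.0 + d) * np.exp(-d) / (4.0 * h)

def solve_q(T, w, h, tol=1e-8, max_iter=200):
    """Damped Newton minimization of q'W Phi W q - sum_i w_i log(q_i^2), q > 0."""
    Phi = k1(T[:, None], T[None, :], h)
    A = w[:, None] * Phi * w[None, :]              # the matrix W Phi W

    def objective(q):
        return q @ A @ q - 2.0 * np.sum(w * np.log(q))

    q = 1.0 / np.sqrt(np.diag(Phi) * w)            # positive starting point
    for _ in range(max_iter):
        grad = 2.0 * (A @ q) - 2.0 * w / q
        if np.max(np.abs(grad)) < tol:
            break
        hess = 2.0 * A + 2.0 * np.diag(w / q ** 2)
        step = np.linalg.solve(hess, -grad)
        f0, slope, s = objective(q), grad @ step, 1.0
        while s > 1e-12:                           # backtracking line search
            q_new = q + s * step
            if np.all(q_new > 0) and objective(q_new) <= f0 + 1e-4 * s * slope:
                q = q_new
                break
            s *= 0.5
        else:
            break                                  # no acceptable step found
    return q

def root_density(t, T, w, q, h):
    """Return u(t) = lam^{-1/2} * sum_i w_i q_i k1(t, T_i; h) and lam,
    with lam chosen so that the integral of u^2 over the line equals 1."""
    wq = w * q
    lam = np.sum(np.outer(wq, wq) * k1_conv(T[:, None], T[None, :], h))
    u = (k1(np.asarray(t, dtype=float)[:, None], T[None, :], h) @ wq) / np.sqrt(lam)
    return u, lam

# Illustrative knots and Kaplan-Meier weights (made-up values).
T = np.array([0.2, 0.5, 0.9, 1.4, 2.3])
w = np.array([1.0, 1.0, 1.25, 1.25, 2.5])
q = solve_q(T, w, h=0.4)
u_at_knots, lam = root_density(T, T, w, q, h=0.4)
```

Because the objective is strictly convex on the positive orthant, the Newton iteration has a single stationary point to find; the line search only serves to keep the iterates positive.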

In the figures that follow we graph the MPLE $f_n^{(1)} \in H(\mathbb{R})$ or
$f_n^{(2)} \in H(\mathbb{R}_+)$ (solid lines) against the underlying density $f$. The
data were generated using the IMSL random number generator GGWIB. The X and Y
samples were generated consecutively starting with DSEED = 255866175. The
sample sizes are $n = 120$. In Figure 1, Exponentials $E(\theta)$, with mean
$\theta = 1$, have been censored by $E(3)$; the number of uncensored observations
is $N_n = 86$. In Figure 2, Weibulls $W(a, \theta)$, with shape parameter $a = 3$ and
mean $\theta = 1$, have been censored by $W(3, 2)$; the number of uncensored
observations is $N_n = 105$. Note that in the case of the Exponential density
$f^{1/2} \in H(\mathbb{R}_+)$ but $f^{1/2} \notin H(\mathbb{R})$ and, as expected,
$f_n^{(2)}$ performs better than $f_n^{(1)}$. On the other hand, in the case of the
Weibull, $f^{1/2} \in H(\mathbb{R}) \cap H(\mathbb{R}_+)$ and $f_n^{(1)}$ seems to
perform better than $f_n^{(2)}$.
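A comparable simulation design can be sketched with NumPy in place of the IMSL generator. This is an illustration under stated assumptions: $E(3)$ and $W(3, 2)$ are read as distributions with means 3 and 2, the seed is arbitrary, and the function names are hypothetical, so the draws will not reproduce the original samples.

```python
import math
import numpy as np

def simulate_censored(n, sample_x, sample_y, rng):
    """Draw lifetimes X and censoring times Y, and return
    Z = min(X, Y) together with the indicator delta = 1{X <= Y}."""
    x = sample_x(rng, n)
    y = sample_y(rng, n)
    return np.minimum(x, y), (x <= y).astype(int)

def exponential(mean):
    return lambda rng, n: rng.exponential(mean, n)

def weibull(shape, mean):
    # rescale the unit-scale Weibull so that it has the requested mean
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return lambda rng, n: scale * rng.weibull(shape, n)

rng = np.random.default_rng(0)   # arbitrary seed, not the original DSEED
z_exp, d_exp = simulate_censored(120, exponential(1.0), exponential(3.0), rng)
z_wei, d_wei = simulate_censored(120, weibull(3.0, 1.0), weibull(3.0, 2.0), rng)
```

Under this reading the probability that an observation is uncensored is 3/4 in the exponential design and 8/9 in the Weibull design, which is in line with the reported counts of 86 and 105 out of 120.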
FIG. 1. The solid lines are the estimates $f_n^{(1)} \in H(\mathbb{R})$ (top), with
data-based $h = .3242$, and $f_n^{(2)} \in H(\mathbb{R}_+)$, with $h = .5$, plotted
against the underlying $f$, which is an E(1) density censored by an E(3). The data
are 120 observations, of which 86 are uncensored. Note that
$f^{1/2} \in H(\mathbb{R}_+)$ but not in $H(\mathbb{R})$, and $f_n^{(2)}$ performs
better than $f_n^{(1)}$.

FIG. 2. The solid lines are the estimates $f_n^{(1)} \in H(\mathbb{R})$ (top), with
$h = .275$, and $f_n^{(2)} \in H(\mathbb{R}_+)$, with $h = .26$, plotted against the
underlying $f$, which is a Weibull(3) density with mean 1 censored by a Weibull(3)
with mean 2. The data are 120 observations, of which 105 are uncensored. Note that
$f^{1/2} \in H(\mathbb{R}) \cap H(\mathbb{R}_+)$, and $f_n^{(1)}$ seems to perform
better than $f_n^{(2)}$.


3. Consistency


Assumptions:  In  the  proofs  of  consistency  of  our  estimators,  we  make  use 
of  the  following  assumptions: 


A1: There exists $\tau > 2$ such that $E[X_1^\tau] < \infty$.

A2: $\int_0^{T_H} [1 - G(s)]^{-1} \, dF(s) < +\infty$.

A3: $\|(f^{1/2})'\|_2^2 = \tfrac14 \int_0^\infty (f'/f)^2 \, dF < +\infty$.

A4: $f'$ changes sign finitely many times.

A5: $T_F < T_G \le +\infty$.

Assumptions  A1  -  A4  suffice  to  establish  consistency  in  probability, 
employing  the  results  of  Gill  (1983)  on  the  weak  convergence  of  the  Kaplan- 
Meier  estimator,  which  require  A2.  To  obtain  almost  sure  consistency,  we 
use  A5  in  place  of  A1  and  A2,  in  order  to  apply  the  law  of  the  iterated 
logarithm  for  the  Kaplan-Meier  estimator  due  to  Foldes  and  Rejto  (1981) . 

The moment condition A1 is used to obtain upper bounds on the maximum of
the uncensored observations. A3 arises in the application of a bound on
the entropy, proved in section 4. A4 arises as a technical condition in a
law of large numbers for integrals relative to the Kaplan-Meier estimator
proved in section 4.

The Root-Density Estimator: We now prove a number of consistency results
for the root-density estimator $u_n$, and establish lower bounds on the rates
of convergence. The consistency of the density estimator $f_n$ is then derived
in the following subsection as a corollary of these convergence properties
of $u_n$. The key Theorem 3.3 provides consistency of $u_n$ in the $L_2$-norm,
relying on a series of lemmas proved in section 4.


We record the following two facts:

(i) From the implicit form of $u_n$, note that

$$\|u_n\|^2 = 1 + h_n^2 \|u_n'\|_2^2 = (n/\lambda_n)\, F_n(+\infty). \tag{3.1}$$

(ii) Since $u_n$ is the maximizer in the optimization problem,

$$\int_0^\infty \log v^2 \, dF_n - \frac{\lambda_n}{n} h_n^2 \|v'\|_2^2 \le \int_0^\infty \log u_n^2 \, dF_n - \frac{\lambda_n}{n} h_n^2 \|u_n'\|_2^2. \tag{3.2}$$


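Theorem 3.3: (a) Under assumptions A1 - A4, and $t > (\tfrac12 - \tfrac1\tau)/3$,

$$\|u_n - v\|_2 = O_p(n^{-d}) \quad \text{for } d < (\tfrac12 - \tfrac1\tau - t)/2.$$

(b) Under assumptions A3 - A5, and $t > 1/6$,

$$\|u_n - v\|_2 = O(n^{-d}) \ \text{a.s.} \quad \text{for } d < (\tfrac12 - t)/2.$$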


Proof. By Lemma 5.3 of Klonias (1982),

$$\|u_n - v\|_2^2 \le \int_0^\infty \log v^2 \, dF - \int_0^\infty \log u_n^2 \, dF$$

$$= \int_0^\infty \log v^2 \, d(F - F_n) + \int_0^\infty \log v^2 \, dF_n - \int_0^\infty \log u_n^2 \, d(F - F_n) - \int_0^\infty \log u_n^2 \, dF_n,$$

which, by (3.2),

$$\le \int_0^\infty \log u_n^2 \, d(F_n - F) - \int_0^\infty \log v^2 \, d(F_n - F) + (\lambda_n/n)\, h_n^2 \{ \|v'\|_2^2 - \|u_n'\|_2^2 \},$$

and since by (3.1), $\lambda_n/n \le 1$,

$$\le \int_0^\infty \log u_n^2 \, d(F_n - F) - \int_0^\infty \log v^2 \, d(F_n - F) + h_n^2 \|v'\|_2^2.$$

The conclusions follow from Lemmas 4.2 and 4.7, with the rate of convergence
being determined by Lemma 4.2. □


The following lemma establishes the rate at which $\lambda_n$ converges to
infinity, for use in the proof of Lemma 3.5 below.

Lemma 3.4: $|(\lambda_n/n) - F_n(+\infty)| = O(n^{-d})$

(a) in probability for $d < (\tfrac12 - \tfrac1\tau - t)/2$,

(b) almost surely for $d < (\tfrac12 - t)/2$,

under the assumptions of Theorem 3.3 (a) or (b), respectively.

Proof: By (3.1),

$$0 \le F_n(+\infty) - (\lambda_n/n) \le (\lambda_n/n)\, h_n^2 \|u_n'\|_2^2,$$

which, by (3.2),

$$\le \int_0^\infty \log u_n^2 \, dF_n - \int_0^\infty \log v^2 \, dF_n + h_n^2 \|v'\|_2^2,$$

so the conclusion follows exactly as in Theorem 3.3.

Next, we establish the consistency of our estimate of the Fisher
information. This result is needed to prove uniform consistency of $u_n$ in
Theorem 3.6.

Lemma 3.5: $n^d \{ \|u_n'\|_2^2 - \|v'\|_2^2 \} \to 0$

(a) in probability for $d < \tfrac12 - \tfrac1\tau - 2t$,

(b) almost surely for $d < \tfrac12 - 2t$,

under the assumptions of Theorem 3.3 (a) or (b), respectively.

Proof: Using (3.2), then (1e.6.6) in Rao (1973),

$$\|u_n'\|_2^2 - \|v'\|_2^2 \le (n/\lambda_n)\, h_n^{-2} \left\{ \int_0^\infty \log u_n^2 \, dF_n - \int_0^\infty \log v^2 \, dF_n \right\}$$

$$\le (n/\lambda_n)\, h_n^{-2} \left\{ \int_0^\infty \log u_n^2 \, d(F_n - F) - \int_0^\infty \log v^2 \, d(F_n - F) \right\}.$$

The conclusion follows from Lemma 3.4 and Lemmas 3.12 and 3.17. □

We are now ready to establish the uniform consistency of $u_n$.

Theorem 3.6: $\|u_n - v\|_\infty = O(n^{-d})$

(a) in probability for $d < (\tfrac12 - \tfrac1\tau - 2t)/2$, $t > (\tfrac12 - \tfrac1\tau)/3$,

(b) almost surely for $d < (\tfrac12 - 2t)/2$, $t > 1/6$,

under the assumptions of Theorem 3.3 (a) or (b), respectively.

Proof: Let $u^*(x) = u(|x|)$ for $x \in \mathbb{R}$. Note that
$\|u^*\|_2^2 = 2\|u\|_2^2$, $\|u^{*\prime}\|_2^2 = 2\|u'\|_2^2$ and
$\|u^*\|_\infty = \|u\|_\infty$. Then, as in Klonias (1982), for each
$x \in \mathbb{R}$,

lun(30  -  v(x)|2  =  |u*(x)  -  v*(x)|2 
i  C  II  un-v  1 1 2  ♦  h„  OKI*  ■  ahjl^'ltl 

+  ^l|v||2jh„(l|u;ll^-IMI^)thjiv^J'’. 

The  conclusion  follows  from  Theorem  3.3  and  Proposition  3.5  (taking  d  = 

Theorem 3.7: $\|u_n' - v'\|_2 = O(n^{-d})$

(a) in probability for $d < (\tfrac12 - \tfrac1\tau - 2t)/4$,

(b) almost surely for $d < (\tfrac12 - 2t)/4$,

under the assumptions of Theorem 3.3 (a) or (b), respectively.

Proof: By computation, then integration by parts,

$$\|u_n' - v'\|_2^2 = \|u_n'\|_2^2 - \|v'\|_2^2 + 2 \int_0^\infty v'(v' - u_n')$$

$$= \|u_n'\|_2^2 - \|v'\|_2^2 - 2 v'(0)[v(0) - u_n(0)] + 2 \int_0^\infty v''(u_n - v),$$

which, by $\|g\|_\infty^2 \le \|g\|_2 \|g'\|_2$ (see Klonias (1981)) and the
Cauchy-Schwarz inequality,

$$\le \|u_n'\|_2^2 - \|v'\|_2^2 + 2 (\|v'\|_2 \|v''\|_2)^{1/2}\, \|u_n - v\|_\infty + 2 \|v''\|_2\, \|u_n - v\|_2.$$


The Density, Distribution, and Hazard Function Estimators:

Using Lemma 4.1 of Klonias (1982) with Theorems 3.3, 3.6, and 3.7, we
obtain consistency of the density estimator in various norms:

Theorem 3.8: Under the assumptions of Theorem 3.3 (a) or (b),

(i) $\|f_n - f\|_1$ and $\|f_n - f\|_2$ both converge to zero in probability or
almost surely with the rates of Theorem 3.3 (a) or (b) respectively.

(ii) $\|f_n - f\|_\infty$ converges to zero in probability or almost surely with
the rates of Theorem 3.6 (a) or (b) respectively.

(iii) $\|f_n' - f'\|_2$ converges to zero in probability or almost surely with the
rates $d < (\tfrac12 - \tfrac1\tau - 4t)/4$ or $d < (\tfrac12 - 4t)/4$ respectively.

As corollaries of Theorem 3.8, we establish the uniform consistency
of the induced estimates of $F$ and $r$. Note that

$$\|F_n - F\|_\infty \le \|f_n - f\|_1,$$

and hence, by Theorem 3.8 (i), we conclude:

Corollary 3.9: Under the assumptions of Theorem 3.3 (a) or (b),

$$\|F_n - F\|_\infty = O(n^{-d})$$

in probability or almost surely with the values of $d > 0$ given in Theorem
3.3 (a) or (b) respectively.

For the proof of the consistency of the induced hazard rate estimator,
note that

$$|r_n(t) - r(t)| = \big| [1 - F_n(t)]^{-1} f_n(t) - [1 - F(t)]^{-1} f(t) \big|$$

$$\le \{ [1 - F_n(t)][1 - F(t)] \}^{-1} \{ \|f_n - f\|_\infty + \|f\|_\infty \|F_n - F\|_\infty \},$$

so by Corollary 3.9 and Theorem 3.8, we have the following result:

Corollary 3.10: Let $I = [0, F^{-1}(1 - \varepsilon)]$ for a fixed $\varepsilon > 0$. Then

$$\sup_{t \in I} |r_n(t) - r(t)| = O(n^{-d})$$

(a) in probability for $d < (\tfrac12 - \tfrac1\tau - 2t)/2$,

(b) almost surely for $d < (\tfrac12 - 2t)/2$,

under the assumptions of Theorem 3.3 (a) or (b), respectively.


4. Auxiliary Lemmas


We now establish bounds for the integrals in the proof of Proposition
3.3. We will use bounds on the rate of convergence of the Kaplan-Meier
estimator and on maxima of i.i.d. random variables, which are stated in
the following remarks:

R1: If $T_F = +\infty$, for any $\beta < \tfrac12$,

$$\|F_n - F\|_\infty = O_p(n^{-\beta})$$

as $n \to \infty$, by an application of Theorem 2.1 of Gill (1983).

R2: If $F$ and $G$ are continuous distribution functions such that
$T_F < T_G \le +\infty$, then

$$\|F_n - F\|_\infty = O\big( (\log\log n / n)^{1/2} \big) \quad \text{a.s.}$$

[See Foldes and Rejto (1981).]

R3: If $E[X_1^\tau] < \infty$, then $X_{nn} = O(n^p)$ almost surely for any
$p > 1/\tau$, where $X_{nn}$ denotes the largest of $X_1, \ldots, X_n$. This fact
follows from a tail probability bound, the Borel-Cantelli Lemma, and
monotonicity of $X_{nn}$ in $n$.

Lemma 4.1: Assume that A2 holds and $T_F = +\infty$. Then

$$n^d \int_{X_{nn}}^\infty [1 - F(t)] \, dt \to 0 \quad \text{a.s.}$$

for all $d < 1 - 1/\tau$.

Proof: Let $W_n = \int_{X_{nn}}^\infty [1 - F(t)] \, dt$. Compute

$$E[W_n] = \int_0^\infty \left\{ \int_s^\infty [1 - F(t)] \, dt \right\} n F(s)^{n-1} f(s) \, ds = \int_0^\infty [1 - F(t)]\, F(t)^n \, dt \le \int_0^\infty [1 - F(t)]\, e^{-n[1 - F(t)]} \, dt.$$

If $E[X^\tau] < \infty$, there exists a constant $c$ such that
$1 - F(t) \le c/t^\tau$ for all $t > 0$, and thus $F^{-1}(1 - c/t^\tau) \le t$ for
all $t > 0$. Splitting the interval of integration at
$F^{-1}(1 - n^{-1} \log n)$, and using these facts to bound the integrand
on the resulting intervals, one obtains

$$E[W_n] = o(n^{1/\tau - 1}).$$

Thus, if $d < 1 - 1/\tau$, by Markov's inequality,

$$P[n^d W_n > \varepsilon] \le \varepsilon^{-1} n^d E[W_n] \to 0,$$

yielding convergence in probability. One can obtain almost sure convergence
along a geometric subsequence, and extend to the entire sequence by monotonicity.

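Lemma 4.2: (a) Under the assumptions A1 and A2, as $n \to \infty$,

$$h_n n^d \int_0^\infty \log u_n^2 \, d(F_n - F) = O_p(1) \quad \text{for } d < \tfrac12 - \tfrac1\tau.$$

(b) Under assumption A5, as $n \to \infty$,

$$h_n (n/\log\log n)^{1/2} \int_0^\infty \log u_n^2 \, d(F_n - F) = O(1) \quad \text{a.s.}$$

Proof: By differentiation in (2.2),

$$|u_n'| \le h_n^{-1} u_n \quad \text{a.s.} \tag{4.3}$$

Since $u_n(x) = \langle k(\cdot - x), u_n \rangle$ and
$\tfrac12 \le \lambda_n/n \le 1$, the Cauchy-Schwarz inequality gives, as in
DeMontricher et al. (1975),

$$\|u_n\|_\infty \le \|k(\cdot - x)\|\, \|u_n\| \le (2 k(0) h_n^{-1} n/\lambda_n)^{1/2} \le (4 k(0) h_n^{-1})^{1/2}.$$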


Thus,  for  t  >  T  , 
-  nN 

n 


A  — I*  *t/h 

u  (t)  >  F  (+  ~)  I2h  >  eW  n, 
n  -  n  n 


and, for $n$ sufficiently large,

$$-t/h_n \le \log u_n(t) \le 0. \tag{4.4}$$

Note that $X_{nn} \ge T_{nN_n}$, and $F_n(t)$ is constant for $t \ge T_{nN_n}$. Thus,
for $n$ sufficiently large,

$$\left| \int_{X_{nn}}^\infty \log u_n^2(t) \, d(F_n - F)(t) \right| = \left| \int_{X_{nn}}^\infty \log u_n^2(t) \, dF(t) \right| \le 2 h_n^{-1} \int_{X_{nn}}^\infty t \, dF(t).$$


By integration by parts,

$$\int_{X_{nn}}^\infty t \, dF(t) = -\int_{X_{nn}}^\infty t \, d[1 - F](t) = \int_{X_{nn}}^\infty [1 - F(t)] \, dt + X_{nn} [1 - F(X_{nn})].$$


Since $\{F(X_i)\}$ are i.i.d. Uniform (0,1) random variables,

$$1 - F(X_{nn}) = O(n^{-1} \log n) \quad \text{almost surely.}$$

Combined with the bounds of R1 and Lemma 2.5, this yields

$$\int_{X_{nn}}^\infty t \, dF(t) = O(n^{p-1} \log n)$$

for any $p > 1/\tau$. Thus

$$\left| \int_{X_{nn}}^\infty \log u_n^2(t) \, d(F_n - F)(t) \right| = O(h_n^{-1} n^{-d})$$

for any $d < 1 - 1/\tau$.

To bound the integral from 0 to $X_{nn}$, integrate by parts:

$$\left| \int_0^{X_{nn}} \log u_n^2 \, d(F_n - F) \right| = \left| \log u_n^2(X_{nn}) (F_n - F)(X_{nn}) - 2 \int_0^{X_{nn}} \frac{u_n'}{u_n} (F_n - F) \right|$$

$$\le 4 h_n^{-1} X_{nn} \|F_n - F\|_\infty$$

$$= O_p(h_n^{-1} n^{p - \beta}) \quad \text{for } \beta < \tfrac12 \text{ and } p > 1/\tau,$$

where the inequality follows from (4.3) and (4.4), and the rate from R1 and
R3. Thus part (a) is verified for any $d < \tfrac12 - 1/\tau$.

For part (b), note that $X_{nn} \to T_F < +\infty$ a.s. as $n \to \infty$, and the
result follows from R2. □


We now establish the rate of convergence of integrals relative to the
Kaplan-Meier estimator.

Proposition 4.5: Let $g: (0, \infty) \to \mathbb{R}$ be a differentiable function.
Suppose that $g$ has $M < \infty$ intervals of increase or decrease, and that
$\int |g|^\gamma \, dF < +\infty$ for some $\gamma > 4$. Then, under assumption A2,

$$\left| \int_0^\infty g \, d(F_n - F) \right| = O_p(n^{-d}) \quad \text{for } d < \tfrac12 - \tfrac2\gamma.$$

Under assumption A4, the convergence above is with probability one.

Proof: For each $k \in \mathbb{Z}$, let $G_k = \{x \in \mathbb{R}: k - 1 < g(x) \le k\}$.
Letting $m_k$ denote the number of intervals of increase or decrease of $g$ which
intersect $G_k$, $G_k$ may be partitioned as $G_k = \bigcup_{i=1}^{m_k} G_{k,i}$,
where $g$ is monotone on each $G_{k,i}$. For any increasing sequence of integers
$\{A_n\}$,

$$\left| \int_0^\infty g \, d(F_n - F) \right| \le \sum_{|k| \le A_n} \left| \int_{G_k} g \, d(F_n - F) \right| + \left| \int_{\{x: |g(x)| > A_n\}} g \, d(F_n - F) \right|.$$

To bound the sum, integrate by parts over each $G_{k,i}$:

$$\left| \int_{G_k} g \, d(F_n - F) \right| = \left| \sum_{i=1}^{m_k} \left\{ \big[ g(x)(F_n - F)(x) \big]_{\partial G_{k,i}} - \int_{G_{k,i}} g'(x)(F_n - F)(x) \, dx \right\} \right|$$

$$\le \|F_n - F\|_\infty \sum_{i=1}^{m_k} \left( 2|k| + \int_{G_{k,i}} |g'(x)| \, dx \right)$$

$$= \|F_n - F\|_\infty \sum_{i=1}^{m_k} \left( 2|k| + \left| \int_{G_{k,i}} g'(x) \, dx \right| \right)$$

$$\le \|F_n - F\|_\infty \sum_{i=1}^{m_k} (2|k| + 1)$$

$$\le M (2|k| + 1) \|F_n - F\|_\infty.$$

Thus, the sum is $O_p(A_n^2 n^{-\beta})$ for any $\beta < \tfrac12$, using R1 to bound
$\|F_n - F\|_\infty$. Use of R2 provides the corresponding almost sure result. To
obtain convergence of the sum to zero, we choose $A_n$ of the form
$A_n = [n^\eta]$, and thus require $\eta < \beta/2$.

To bound the remaining integral, first show that

$$\int_{\{x: |g(x)| > A_n\}} g \, dF_n = 0$$

for $n$ sufficiently large. For $\eta > 1/\gamma$,

$$P\left( \int_{\{x: |g(x)| > A_n\}} |g| \, dF_n \ne 0 \ \text{i.o.} \right) = P(\exists\, i \in \{1, \ldots, N_n\}: |g(T_{ni})| > A_n \ \text{i.o.})$$

$$\le P(\exists\, i \in \{1, \ldots, n\}: |g(X_i)| > A_n \ \text{i.o.})$$

$$= P\left( \max_{1 \le i \le n} |g(X_i)| > A_n \ \text{i.o.} \right) = 0$$

by R3, since $E|g(X)|^\gamma < \infty$.

Thus, we need only bound $\int_{\{x: |g(x)| > A_n\}} g(x) \, dF(x)$ for $n$
sufficiently large. Let $F_g$ denote the distribution function of $|g(X_1)|$.
Note that since $E|g(X_1)|^\gamma < \infty$, $y^\gamma (1 - F_g(y)) \to 0$ as
$y \to \infty$, so for $y$ sufficiently large, $1 - F_g(y) \le y^{-\gamma}$.
Integrating by parts and applying this bound twice,

$$\left| \int_{\{x: |g(x)| > A_n\}} g \, dF \right| \le \int_{A_n}^\infty y \, dF_g(y) = A_n (1 - F_g(A_n)) + \int_{A_n}^\infty [1 - F_g(y)] \, dy.$$

For each $\varepsilon > 0$, we can obtain the rate of convergence
$d = \tfrac12 - \tfrac2\gamma - 3\varepsilon$ by letting
$\eta = \tfrac1\gamma + \varepsilon$ and $\beta = \tfrac12 - \varepsilon$. □

The following result for continuous distributions is an analog of the
result of Keilson (1971) that a positive integer-valued random variable
with a finite moment of any positive order has finite entropy.

Proposition 4.6: Let $f$ be a probability density function such that
$\|f\|_\infty < +\infty$ and $\int x^\tau f(x) \, dx < +\infty$ for some $\tau > 0$.
Then for all $\gamma > 0$,

$$\int f(x)\, |\log f(x)|^\gamma \, dx < +\infty.$$

Proof: Note that $x |\log x|$ is strictly increasing on $(0, 1/e)$. Thus,
given $\varepsilon > 0$, $x^\varepsilon |\log x^\varepsilon| \le 1/e$ for
$0 < x < e^{-1/\varepsilon}$. Equivalently, for $\gamma > 0$,

$$|\log x|^\gamma \le (e\varepsilon)^{-\gamma} x^{-\varepsilon\gamma} \quad \text{for } 0 < x < e^{-1/\varepsilon}.$$

On $D = \{x \in \mathbb{R}: f(x) < e^{-1/\varepsilon}\}$,

$$f(x)\, |\log f(x)|^\gamma \le (e\varepsilon)^{-\gamma} f(x)^{1 - \varepsilon\gamma},$$

so

$$\int_D f\, |\log f|^\gamma \le (e\varepsilon)^{-\gamma} \int_D f^{1 - \varepsilon\gamma}.$$


Let  S  =  {xED  :  f  (x)  <  x'(T+1)}.  On  S,  f(x)_£Y  <  xfY+l)ey^ 
so 

Vl'er  5 's  +  tDSS  fWd. 

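For $\varepsilon$ sufficiently small, $(\tau + 1)(1 - \varepsilon\gamma) > 1$ and
$(\tau + 1)\varepsilon\gamma \le \tau$, so the right side is finite. Then

$$\int f\, |\log f|^\gamma \le \big( |\log \|f\|_\infty| \vee \tfrac1\varepsilon \big)^\gamma \int_{D^c} f(x) \, dx + (e\varepsilon)^{-\gamma} \int_D f(x)^{1 - \varepsilon\gamma} \, dx < +\infty.$$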

To obtain consistency for the estimator $u_n$, we apply Proposition 4.5
to the function $g = \log f$. By Proposition 4.6, if $E|X_1|^\delta < \infty$ for any
$\delta > 0$, then $\int f(x)\, |\log f(x)|^\gamma \, dx < \infty$ for every
$\gamma > 0$. Noting that $\|f\|_\infty \le \|(f^{1/2})'\|_2$ as in Klonias (1981),
we obtain the following corollary.

Lemma 4.7: (a) Under assumptions A1-A4,

$$\left| \int_0^\infty \log f \, d(F_n - F) \right| = o_p(n^{-d}) \quad \text{for } d < 1/2.$$

(b) Under assumptions A3-A5, the convergence in part (a) is almost sure.